LISTING OF THE CLAIMS:

1. (Currently Amended): A method for the segmentation of an audio stream into
semantic or syntactic units wherein the audio stream is provided in a digitized format, 
comprising the steps of: 

determining a fundamental frequency for the digitized audio stream; 

detecting changes of the fundamental frequency in the audio stream, wherein detecting
the changes of the fundamental frequency includes providing a threshold value for 
estimates of the fundamental frequency's voicedness and determining whether the 
voicedness of the fundamental frequency estimates are higher or lower than the threshold 
value; 

determining candidate boundaries for the semantic or syntactic units depending on the 
detected changes of the fundamental frequency; 

extracting and combining a plurality of [[at least one]] prosodic [[feature]] features in the
neighborhood of the candidate boundaries; and

determining boundaries for the semantic or syntactic units depending on the combined 
plurality of [[at least one]] prosodic [[feature]] features.

2. (Canceled) 

3. (Currently Amended): The method according to claim [[2]] 1, wherein defining 
an index function for the fundamental frequency having a value = 0 if the voicedness of 
the fundamental frequency is lower than the threshold value and having a value = 1 if the
voicedness of the fundamental frequency is higher than the threshold value. 

4. (Currently Amended): The method according to claim 3, wherein extracting
[[at least one]] the plurality of prosodic [[feature]] features is in an environment of the audio
stream where the value of the index function is [[equal]] = 0. 

5. (Original): The method according to claim 4, wherein the environment is a time 
period between 500 and 4000 milliseconds. 
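
As a non-limiting illustration of the index function recited in claims 3 and 4, the following minimal sketch maps per-frame voicedness estimates to 0 or 1 and lists the stretches where the index is 0. The function names, the per-frame list representation, and the `frame_ms` parameter are assumptions made for this example and do not appear in the claims. The claims do not say how a voicedness exactly equal to the threshold is treated; this sketch assigns it 0.

```python
from typing import List, Tuple


def index_function(voicedness: List[float], threshold: float) -> List[int]:
    """Return 1 where voicedness is higher than the threshold, else 0.

    Assumption: a value exactly equal to the threshold maps to 0.
    """
    return [1 if v > threshold else 0 for v in voicedness]


def unvoiced_runs(index: List[int], frame_ms: float) -> List[Tuple[int, int, float]]:
    """Return (start_frame, end_frame_exclusive, duration_ms) for each run of zeros."""
    runs = []
    start = None
    for i, value in enumerate(index):
        if value == 0 and start is None:
            start = i  # a run of index = 0 begins here
        elif value == 1 and start is not None:
            runs.append((start, i, (i - start) * frame_ms))
            start = None
    if start is not None:
        # the stream ended while the index was still 0
        runs.append((start, len(index), (len(index) - start) * frame_ms))
    return runs
```

For example, `unvoiced_runs(index_function(v, 0.5), 10.0)` returns the frame ranges in which the index is 0, which could serve as the environments of claim 4 in which the prosodic features are extracted.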

6. (Currently Amended): The method according to claim 1, wherein [[the]] at least
one prosodic feature is represented by the fundamental frequency. 

7. (Canceled) 

8. (Original): The method according to claim 1, further comprising first detecting 
speech and non-speech segments in the digitized audio stream and performing the steps 
of claim 1 thereafter only for detected speech segments. 

9. (Original): The method according to claim 8, wherein the detecting of speech and 
non-speech segments comprises utilizing the signal energy or signal energy changes, 
respectively, in the audio stream. 
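
As a non-limiting illustration of the energy-based detection recited in claims 8 and 9, the following minimal sketch computes a short-time energy per frame and marks frames above a fixed energy threshold as speech. The non-overlapping framing, the mean-square energy measure, the fixed threshold, and all names are assumptions made for this example; the claims specify only that signal energy or signal energy changes are utilized.

```python
from typing import List


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of consecutive non-overlapping frames.

    Trailing samples that do not fill a whole frame are ignored.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(s * s for s in frame) / frame_len)
    return energies


def speech_mask(samples: List[float], frame_len: int,
                energy_threshold: float) -> List[bool]:
    """True for frames whose energy exceeds the (assumed) fixed threshold."""
    return [e > energy_threshold for e in frame_energies(samples, frame_len)]
```

Under these assumptions, the remaining steps of claim 1 would then be applied only to frames for which the mask is true.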

10. (Original): The method according to claim 1, further comprising the step of 
performing a prosodic feature classification based on a predetermined classification tree. 

11. (Currently Amended): An article of manufacture comprising a computer usable
medium having computer readable program code means embodied therein for causing 
segmentation of an audio stream into semantic or syntactic units, wherein the audio 
stream is provided in a digitized format, the computer readable program code means in 
the article of manufacture comprising computer readable program code means for 
causing a computer to effect: 

determining a fundamental frequency for the digitized audio stream; 

detecting changes of the fundamental frequency in the audio stream, wherein detecting
the changes of the fundamental frequency includes providing a threshold value for
estimates of the fundamental frequency's voicedness and determining whether the
voicedness of the fundamental frequency estimates are higher or lower than the threshold
value;

determining candidate boundaries for the semantic or syntactic units depending on the 
detected changes of the fundamental frequency; 

extracting and combining a plurality of [[at least one]] prosodic [[feature]] features in the
neighborhood of the candidate boundaries; and

determining boundaries for the semantic or syntactic units depending on the combined 
plurality of [[at least one]] prosodic [[feature]] features.

12. (Currently Amended): A digital audio processing system for segmentation of a
digitized audio stream into semantic or syntactic units comprising: 

means for determining a fundamental frequency for the digitized audio stream, 

means for detecting changes of the fundamental frequency in the audio stream, wherein
detecting the changes of the fundamental frequency includes providing a threshold value
for estimates of the fundamental frequency's voicedness and determining whether the 
voicedness of the fundamental frequency estimates are higher or lower than the threshold 
value, 

means for determining candidate boundaries for the semantic or syntactic units depending 
on the detected changes of the fundamental frequency, 

means for extracting and combining a plurality of [[at least one]] prosodic [[feature]] features in
the neighborhood of the candidate boundaries, and 

means for determining boundaries for the semantic or syntactic units depending on the
combined plurality of [[at least one]] prosodic [[feature]] features.

13. (Original): An audio processing system according to claim 12, further comprising
means for generating an index function for the voicedness of the fundamental frequency 
having a value = 0 if the voicedness of the fundamental frequency is lower than a 
predetermined threshold value and having a value = 1 if the voicedness fundamental 
frequency is higher than the threshold value. 

14. (Original): Audio processing system according to claim 12 or 13, further
comprising means for detecting speech and non-speech segments in the digitized audio 
stream, particularly for detecting and analyzing the signal energy or signal energy 
changes, respectively, in the audio stream.